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Abstract — Within systems health management, prognostics fo- 
cuses on predicting the remaining useful life of a system. In 
the model-based prognostics paradigm, physics-based models 
are constructed that describe the operation of a system and 
how it fails. Such approaches consist of an estimation phase, 
in which the health state of the system is first identified, and a 
prediction phase, in which the health state is projected forward 
in time to determine the end of life. Centralized solutions to 
these problems are often computationally expensive, do not scale 
well as the size of the system grows, and introduce a single 
point of failure. In this paper, we propose a novel distributed 
model-based prognostics scheme that formally describes how to 
decompose both the estimation and prediction problems into 
independent local subproblems whose solutions may be easily 
composed into a global solution. The decomposition of the prog- 
nostics problem is achieved through structural decomposition 
of the underlying models. The decomposition algorithm creates 
from the global system model a set of local submodels suitable 
for prognostics. Independent local estimation and prediction 
problems are formed based on these local submodels, resulting in 
a scalable distributed prognostics approach that allows the local 
subproblems to be solved in parallel, thus offering increases in 
computational efficiency. Using a centrifugal pump as a case 
study, we perform a number of simulation-based experiments to 
demonstrate the distributed approach, compare the performance 
with a centralized approach, and establish its scalability. 

Index Terms — model-based prognostics, distributed prognos- 
tics, structural model decomposition 


Abbreviations & Acronyms 


EOL 

end of life 

PRMSE 

percent root mean square error 

RA 

relative accuracy 

RPM 

revolutions per minute 

RSD 

relative standard deviation 

RUL 

remaining useful life 

UKF 

unscented Kalman filter 

UT 

unscented transform 
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Notation 

x state vector 

6 parameter vector 

u input vector 

y output vector 

r performance requirement 
1Z set of performance requirements 
uj rotational velocity 

r torque 

p pressure or probability 

Q volumetric flow 

T temperature 

r friction coefficient 

w wear parameter 

Ai model/submodel 

v variable 

V voltage or set of variables 

X set of states 

0 set of parameters 

U set of inputs 

Y set of outputs 

c constraint 

C set of constraints 

e c equation of constraint c 
a causal assignment 

A set of causal assignments 

I. Introduction 

Systems health management is an engineering discipline 
that seeks to improve the design and operation of complex 
systems in the presence of faults and degradations. Prognostics 
is an essential technology for systems health management 
that centers on predicting the useful life of components, 
subsystems, or systems. This information may be used to slow 
damage progression, prolong system life, and optimize mainte- 
nance activities. Model-based prognostics approaches capture 
knowledge of how a system and its components fail through 
the use of physics-based models that describe the underlying 
physical phenomena [1] — [7]. These algorithms consist of two 
parts: ( i ) estimation, which computes the current joint state- 
parameter estimate of the system to determine the current 
health state, and (ii) prediction, which projects the current joint 
state-parameter estimate forward in time to determine end of 
life (EOL) and/or remaining useful life (RUL). 

To date, virtually all prognostics approaches employ a 
centralized architecture. However, centralized approaches have 
several drawbacks: they embody a single point of failure, 
are computationally expensive, and do not scale well as the 
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size of the system increases. Distributed architectures, on 
the other hand, offer several advantages. In particular, im- 
plementation platforms are becoming increasingly distributed, 
involving systems of smart sensors and smart components, in 
addition to multi-core processors [8]. Distributed approaches 
naturally take advantage of these new architectural paradigms, 
and hence improve scalability and computational efficiency. 
Distributed implementations on large systems are also easier 
to maintain when components are added or removed from the 
system. 

Specifically, we propose a novel distributed prognostics 
approach that exploits structural model decomposition [9]. In 
a model-based prognostics paradigm, the prognostics problem 
is defined by the underlying model. So, by decomposing 
the system model, we decompose the model-based prognos- 
tics problem. Several methods for structural model decom- 
position have, in fact, been developed for the purposes of 
diagnosis [10] — [13], but none for prognosis. In this work, 
we adopt the model decomposition framework developed 
previously in [14]. Like other structural model decomposition 
approaches, the key feature of the derived submodels is that 
they are computationally independent. Therefore, local prog- 
nostics problems based on the submodels can be solved in- 
dependently. As a result, solution of the subproblems requires 
no communication between the algorithms. This approach also 
provides more flexibility, allowing different algorithms to be 
applied to each subproblem, and, thus, each subproblem can 
be solved with the most appropriate strategy. The proposed 
approach to distributed prognostics developed in this paper is 
a fundamentally different approach from previous distributed 
prognostics approaches, e.g. [15], [16]. In such approaches, 
the global problem is still solved, and the computation is 
simply distributed, whereas in our approach, the approach 
is distributed by decomposing the global problem into local 
subproblems that are solved in parallel. 

In earlier work [17], preliminary results were presented in 
which only the estimation problem was decomposed using 
structural model decomposition as described in [10]. In this 
paper, we show how the more general model decomposition 
framework of [14] can be used to decompose both the esti- 
mation and prediction problems for model-based prognostics. 
The work of [14] shows how the estimation and prediction 
problems can be decomposed, however, it does not provide 
any algorithms for distributed prognostics. In this paper, we 
develop a distributed prognostics architecture based on the 
derived submodels, and includes the algorithms for distributed 
prognostics. The model decomposition corresponds to a set of 
local estimation and prediction problems that are smaller, and, 
therefore, easier to solve and require less computation than the 
global problem. Each local estimator computes a local joint 
state-parameter estimate. The local predictors use the outputs 
of the local estimators as inputs to their prediction routine, 
yielding local EOL/RUL predictions. Global EOL/RUL pre- 
dictions are then determined based on the local EOL/RUL 
predictions. 

We demonstrate our distributed prognostics methodology on 
a centrifugal pump that is used for liquid oxygen transfer in 
spacecraft fueling operations at Kennedy Space Center [2] . We 


derive submodels for local estimation and for local prediction, 
forming a distributed prognostics architecture for the pump. 
We perform a number of simulation-based experiments, and 
show that our distributed approach performs comparably to 
the centralized approach in terms of accuracy and precision 
of the life predictions, and at decreased computational cost. 
In addition, since the value of a distributed approach is only 
readily apparent with large-scale systems, we demonstrate the 
improved scalability of the distributed approach using a large- 
scale system composed of multiple pumps. 

The contributions of the paper are as follows: (i) a novel 
distributed model-based prognostics framework based upon 
structural model decomposition, including algorithms for dis- 
tributed estimation and prediction and the merging of local 
results into global results; (ii) the application of the distributed 
prognostics approach to a centrifugal pump with comprehen- 
sive simulation-based experimental results that validate the 
approach and compare to a centralized approach; and (Hi) 
a proof that the distributed approach is more efficient and 
scalable than the centralized approach, with corroborating 
experimental results. 

The paper is organized as follows. Section II formally de- 
fines the prognostics problem, describes the centralized prog- 
nostics architecture, and introduces our proposed distributed 
architecture. Section III describes the modeling methodol- 
ogy and develops the centrifugal pump model. Section IV 
overviews the model decomposition framework. Section V 
formulates the distributed prognostics problem using model 
decomposition, and provides an architecture for the pump. 
Section VI describes distributed estimation, and Section VII 
describes distributed prediction. Section VIII provides results 
from simulation-based experiments, evaluates the approach, 
and shows its scalability. Section IX compares our approach 
with related work, and Section X concludes the paper. 

II. Model-based Prognostics 

In this section we first formulate the model-based prognos- 
tics problem. We then describe the typical centralized prog- 
nostic architecture, followed by our proposal for a distributed 
architecture. 

A. Problem Formulation 

The goal of prognostics is the prediction of the EOL and/or 
RUL of a system. We assume the system model may be 
generally defined as 

x(t) = f {t, x(i), 0(t),u(t),v(t)), 
y(t) = h(t, x(t), 0(t), u(f), n(f)), 

where x(f) £ R nx is the state vector, 0(t) £ R™ 0 is the 
unknown parameter vector, u (t) £ R n “ is the input vector, 
v(t) £ R"” is the process noise vector, f is the state 
equation, y (t) £ R"« is the output vector, n(f) £ R”" is 
the measurement noise vector, and h is the output equation. 1 

In prognostics, we are interested in the time at which the 
performance of a system lies outside some desired region 

1 Bold typeface denotes vectors, and n a denotes the length of a vector a. 
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of acceptable behavior. Outside this region, we say that the 
system has failed. The desired performance is expressed 
through a set of n r requirements, 1Z = { where 
Ti : l" 1 xl 11 ® xK"“ —x B maps a given point in the joint state- 
parameter space given the current inputs, (x(t), 0{t), u (£)), to 
the Boolean domain B = {0, 1}, where rj(x(f), 0(f), u(f)) = 
1 if the requirement is satisfied, and 7"j(x(f), 0(f), u(f)) = 0 
if the requirement is not satisfied. 

These individual requirements are combined into a single 
threshold function Teol '■ R"* x JR" 0 —X B, defined as 


T EO l (x(f), 0(f)i u(f)) 


1, 0 e {rj(x(f), 0(f), u(f))}"T x 
0, otherwise. 


That is, Teol evaluates to 1, i.e., the system has failed, when 
any of the requirements are violated [2]. EOL is then defined 
as 


EOL(t P ) = 

inf{f S R. : f > t P A T E OL(x-(t),0(t), u(f)) = 1}, 

i.e., EOL is the earliest time point at which Teol is met. RUL 
is expressed using EOL as 

RULlfp) = EOL(t P ) -t P . 


B. Centralized Architecture 

In order to compute EOL, we need the current state of 
the system, which is unknown. Therefore, in the model-based 
prognostics paradigm, the problem of predicting EOL/RUL 
is split into two sequential problems: ( i ) estimation, which 
computes the state-parameter estimate, and (if) prediction, 
which simulates the current joint state-parameter estimate 
forward in time to determine EOL/RUL [1], [2], [4]. 

The centralized architecture implementing the model-based 
prognostics approach works as follows. In discrete time k, 
the system receives inputs u k and provides measured outputs 
y k . With the system model, the estimation module uses this 
information to compute an estimate p(x k , 0fc|yo : fc), accounting 
for the presence of process noise v(f) and sensor noise n(f). 
Given this state-parameter estimate, the prediction algorithm 
uses the model to simulate this distribution out to EOL 
to compute p(EOL kp \y 0:kp ) and p{RUL kp \y 0:kp ) at given 
prediction times kp. In order to do this, the prediction step 
must hypothesize the future inputs to the system u k for 
k > kp. 

The centralized approach solves the global prognostics 
problem by solving global estimation and prediction problems. 
Centralized approaches, however, introduce a host of potential 
problems. Aside from the fact that most modern computa- 
tional architectures are distributed, be it through multi-core 
processors or networked systems, the most significant issue is 
scalability. As the size of the problem increases, the methods 
to solve it become more and more costly, and can suffer from 
problems such as convergence of the estimates. We therefore 
propose a distributed approach that solves the global problem 
through a set of local subproblems. 


C. Distributed Architecture 

The key idea of the distributed approach is to decompose 
the global model into a set of local submodels, with each local 
submodel defining a local estimation or prediction subproblem. 
For a given model, we generate a set of submodels with 
local variables x* C x, 9 l C 0, u 1 C u, y l C y, 
TZ l C 7 Z, and with local equations f ' , h\ and T' FOI . Once 
the submodels have been defined, the distributed architecture 
works as follows. In discrete time k, the system is provided 
with inputs u/,. and provides measured outputs y k . The inputs 
u k and the outputs y k are split into local inputs u(. and 
outputs y l k . Local estimators compute p(xj., 0 l k |yg. fe ). From 
the local estimates, the inputs to the local predictors are 
constructed. The local predictors compute local EOL/RUL 
predictions p(EOLl p |yj :fcp ) and p(RUL\ p |yj,. fep ) at given 
prediction times kp. Local predictions are then merged into 
global predictions p(EOL kp \y 0:kp ) and p{RUL kp \y 0:kp ). 

Models are decomposed by selecting some set of variables 
as local inputs in addition to the global inputs u to form u\ 
In this way, we can derive submodels that can be computed 
independently of other submodels given the local inputs. This 
means that local estimation and prediction subproblems are 
independent and can be solved in parallel without communi- 
cation. For the estimation phase, the measured sensor signals 
can be used as additional inputs, exploiting the redundancy 
they provide [18]. For the prediction phase, the insight is to 
select, as local inputs, variables that can be predicted a priori 
(e.g., a controlled quantity). The sets of local inputs chosen for 
estimation and prediction may be different, in which case the 
resulting submodels used for estimation and prediction will 
be different, so the estimates required for prediction must be 
reconstructed from the results of the local estimators. 

Because the subproblems are smaller than the global prob- 
lem and can be solved in parallel, this approach is more 
efficient and scalable than a centralized approach, as will be 
shown in Section VIII. However, the distributed approach does 
have some limitations. For the estimation phase, information 
will be lost due to the decomposition (specifically, the covari- 
ance of decoupled variables), and noisy sensor signals will be 
used as local inputs. Therefore, the distributed approach will 
not obtain the same exact answer as the centralized approach. 
We will show in Section VIII that, despite these limitations, 
the performance of the distributed approach is comparable to 
the performance of the centralized approach. Before presenting 
the details of the distributed prognostics approach, in the next 
section, we introduce the pump case study and its model. 

III. Centrifugal Pump Modeling 

In this work, we use a centrifugal pump as a case study. The 
particular pump under study is used to transfer liquid oxygen 
for spacecraft fueling operations, the model for which was 
originally presented in [2]. In practice, distributed prognostics 
may not be warranted for a single component, but we use the 
pump as a case study here because it is a complex but small 
enough system to fully describe and demonstrate our approach. 
In Section VIII, we will investigate the scalability properties 
of our approach using a large-scale multi-pump system. 
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Outlet 



Fig. 1. Centrifugal pump. 


In order to apply model-based prognostics, we must develop 
a model of the system under consideration. This includes 
identifying the state vector x(t), the parameter vector 9(t), the 
input vector u(f), the output vector y(f), the state equation f, 
the output equation h, and the set of performance requirements 
1Z. In this section, we summarize the main features of the 
pump model, first describing the nominal model, and then 
describing its damage progression models. 


A. Nominal Model 


Centrifugal pumps are used for the delivery of fluids in a 
system. A schematic is shown in Fig. 1 . Fluid enters the inlet, 
and the impeller rotation, driven by an electric motor, forces 
it through the outlet. Bearings help minimize friction along 
the shaft, and are lubricated by oil residing in the bearing 
housing. The pump state includes u(t), the rotational velocity 
of the pump; Q(t), the discharge flow; T t (t), the thrust bearing 
temperature; T r (t) the radial bearing temperature; and T a (t), 
the oil temperature. 

The rotational velocity of the pump is described using a 
torque balance, 

= j (r e - ruj - t l ) , (1) 

where J is the lumped motor/pump inertia, r e is the electro- 
magnetic torque provided by the motor, r is the lumped friction 
parameter, and tx is the load torque. We assume the pump is 
driven by an induction motor, in which torque is produced 
only when there is a slip, s, between the synchronous speed 
of the supply voltage, w s and the mechanical rotation, oj: 


s = 


UJ S — U) 
LO s 


( 2 ) 


The expression for the torque r e is derived from an equivalent 
circuit representation for a three-phase induction motor [19]: 

t = npRv X! ^ 

su } 3 (f?i + R 2 / s ) 2 + {uj s Li + uj s L 2 ) 2 

where Ri is the stator resistance, L \ is the stator inductance, 
f ?2 is the rotor resistance, L 2 is the rotor inductance, n is the 
number of phases (typically 3), p is the number of magnetic 
pole pairs, and V is the applied rms motor voltage. The 
dependence of torque on slip creates a feedback loop that 
causes the rotor to follow the rotation of the magnetic field. 
Rotor speed is controlled by changing uj s , e.g., through the 


use of a variable-frequency drive, which will also change V 
to manage power usage. 

Load torque tl is a polynomial function of the pump flow 
rate and the impeller rotational velocity [20], [21]: 

t l = aoio 2 + aiuQ - a 2 Q 2 , (4) 

where ao, ai, and a 2 are coefficients derived from the pump 
geometry [21]. 

The rotation of the impeller creates a pressure difference 
from the inlet to the outlet of the pump, which drives the 
pump flow, Q. The pump pressure is computed as 

p p = b 0 uj 2 + biUjQ - b 2 Q 2 , (5) 

where b 0 , bi, and b 2 are coefficients derived from the pump 
geometry. The parameter bo is proportional to impeller area 
A [22]. Flow through the impeller, Qi, is computed using the 
pressure differences: 

Qi = cyj | p s +p p -pd\ sign(p s + p p - Pd ), (6) 

where c is a flow coefficient, p s is the suction pressure, and 
Pd is the discharge pressure. To account for fluid inertia, the 
discharge flow is described by 

Q = -j~{Qo ~ Qi), ( 7 ) 

J Q 

where Jq is the flow inertia. 

Pump temperatures are often monitored as indicators of 
pump condition. The oil heats up due to the radial and thrust 
bearings and cools to the environment: 

To = ( H 0il {T t - T 0 ) + H 0 , 2 (T r - To) - H 0 , 3 (T 0 - T a )), 

’Jo 

( 8 ) 

where J a is the thermal inertia of the oil, and the //„ , terms 
are heat transfer coefficients. The thrust bearings heat up due 
to the friction between the pump shaft and the bearings, and 
cool to the oil and the environment: 

T t = ^(r t cc 2 - H ul {T t - T 0 ) - H ta {T t - T a )), (9) 

•h 

where J t is the thermal inertia of the thrust bearings, rt is 
the friction coefficient for the thrust bearings, and the // t , 
terms are heat transfer coefficients. The radial bearings behave 
similarly: 

f r = jr (rvw 2 - H rA {T r - To) - H r , 2 (T r - T a )), (10) 

where J r is the thermal inertia of the radial bearings, r r is the 
friction coefficient for the radial bearings, and the H r i terms 
are heat transfer coefficients. 

The overall input vector u is given by 

U W = [Ps(t) Pd(t) T a (t) V(t) w s (f)] T . (11) 

The available pump sensors form the measurement vector 
y given by 

y(t)=[w(t) Q(t) To(t) T t (t) T r (t)\ T . (12) 
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TABLE I 

Nominal Pump Parameters 


Input Voltage 


Rotational Velocity 


460 
440 
, 420 
400 
380 


0 


__ 014 
"1 0-13 
| 0.12 
0.11 


1 2 
Time (hours) 
Discharge Flow 


0 12 3 

Time (hours) 

Radial Bearing Temperature 



460 

440 

420 

400 

380 

360 


0 12 3 

Time (hours) 

Thrust Bearing Temperature 




Fig. 2. Nominal pump operation. 

Fig. 2 shows nominal pump operation, with the parameters 
given in Table I. The input voltage and line frequency are 
varied to control the pump speed (commanded line frequency 
is equal to the observed pump speed). Initially, slip is 1 so 
an electromagnetic torque is produced, causing a rotation of 
the motor to match the rotation of the magnetic field, with 
a small amount of slip remaining. The pump rotation creates 
fluid flow and heats up the bearings. 

B. Damage Modeling 

The performance requirements of the pump are specified by 


efficiency and temperature limits: 

r]>V~ (13) 

T 0 < T+ (14) 

T t < T+ (15) 

T r < T+, (16) 


where the ~ superscript denotes a minimum and the + 
superscript denotes a maximum, and efficiency 77 is defined 
as t) = (j, d Yp j q for nominal inputs ( I is rms motor current). 
We take 77 ” = 0 . 75 ? 7 o, where 770 is the nominal efficiency. 
When the maximum temperatures are reached, irreversible 
damage occurs. Here, we use 7'+ = 333 K, T t + = 370 K, 
and T+ = 370 K. 

The most significant damage mechanism for pumps is 
impeller wear. It is represented as a decrease in impeller area 
A [22], [23]. Since the impeller area is proportional to bo, 
a decrease in impeller area causes a decrease in the pump 
pressure, and, hence, the pump efficiency. We use the erosive 


Parameter 

Value 

w(0) 

376 rad/s 

J 

50 kg m 2 

r 

8.0 x 10 -3 N m s 

n 

3 phases 

p 

1 pole pair 

Ri 

3.6 x 10 _1 H 

Rri 

7.6 x io~ 2 n 

Li + L 2 

6.3 x 10" 4 H 

Q{ 0) 

0 m 3 /s 

CIO 

1.5 x 10 -3 kg m 2 

ai 

5.8 kg/m 

d2 

9.2 x 10 3 kg/m 4 

60 (0) 

12.7 kg/m 

6l 

1.8 x 10 4 kg/m 4 


0 kg/m 7 

C 

8.2 x 10 _s m 7/2 /kg 1/2 

Cl 

1.0 x 10- 10 m 7/2 /kg 1/2 

Jq 

5.0 s” 1 

To( 0) 

290 K 

Jo 

8.0 x 10 3 K/J/s 

Ho A 

1.0 W/K 

Ho , 2 

3.0 W/K 

Ho, 3 

1.5 W/K 

T r { 0) 

290 K 

Jr 

2.4 K/J/s 

r r ( 0) 

1.8 x 10 -6 N m s 

H r , r 

1.8 x 10 -3 W/K 

H T , 2 

2.0 x 10 -2 W/K 

Tt( 0) 

290 K 

Jt 

7.3 K/J/s 

rt( 0) 

1.4 x 10 -6 N m s 

77/:. 1 

3.4 x 10 -3 W/K 

H t ,2 

2.6 x 10” 2 W/K 


wear equation [24] to describe how the impeller area changes 
over time. The erosive wear rate is proportional to fluid 
velocity times friction force. Fluid velocity is proportional to 
volumetric flow rate, and friction force is proportional to fluid 
velocity. We lump the proportionality constants into the wear 
coefficient vja to obtain [2] 

A = - w A Ql (17) 

Because A is proportional to b 0 , then h 0 = kA = — kwAQ 2 , 
so we estimate bo and Wb 0 = kwA- 

Another significant damage mechanism for pumps is bear- 
ing wear, which is captured as an increase in the friction co- 
efficient. Sliding and rolling friction generate wear of material 
which increases the coefficient of friction [2], [3], [24]: 

h = w t r t uj 2 , (18) 

f r = w r r r ur , (19) 

where wt and w r are the wear coefficients. The slip com- 
pensation provided by the electromagnetic torque generation 
masks small changes in friction, so it is only with very large 
increases that a change in u> will be observed. Changes in 
friction manifest more strongly in the bearing temperatures, 
eventually driving them to the temperature limits. 
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So, the full state vector is 

x(i) = [w(t) Q(t) T a (t ) T t (t ) T r (t) b 0 (t) r t (t ) r r (f)] J . 

(20) 

The initial conditions for bo, rt, and ry are given in Table I. 
The wear parameters form the unknown parameter vector, i.e., 

Q{t) = [w bo w t w r ] T . (21) 

IV. Model Decomposition 

To decompose the problem of model-based prognostics, we 
decompose the underlying structural model. We use the model 
decomposition framework described in [14], but simplify it, 
without loss of generality, by removing the notion of auxiliary 
variables (intermediate variables derived from the states, pa- 
rameters, and inputs). This simplifies the model decomposition 
algorithm and allows us to make guarantees of the minimality 
of the derived submodels. 

We introduce the requisite notation and concepts of the 
model decomposition framework in the following. Additional 
details and the full version of the framework can be found 
in [14]. We begin with the definition of a model. 

Definition 1 (Model). A model A4* is a tuple M* = ( V. , C), 
where V is a set of variables, and C is set of constraints. V 
consists of four disjoint sets, namely, the set of state variables, 
X\ the set of parameters, 0; the set of inputs, U\ and the set 
of outputs, Y. Each constraint c = (e c , V c ) £ C consists of 
an equation s c involving variables V c € V. 

Input variables u £ U are known or measured, and cor- 
respond to the input signals u(t). The subset of the outputs 
corresponding to the (measured) sensor signals y (f) are de- 
noted as Y* C Y . Parameters 6 £ 0 include explicit model 
parameters corresponding to 0(f) that are used in the model 
constraints. 0 consists only of those parameters that are to be 
made explicit for joint state-parameter estimation. 

As shown in Defn. 1, a constraint c = (e c , V c ) includes 
an equation e c over the set of variables V c . These constraints 
are essentially representative of the vector functions f and h, 
along with the requirements 1Z. We associate explicit variables 
for the evaluation of the performance requirements, e.g., e,; = 
ry(x(t), 0(f), u(t)). Note that, typically, a given ry is only a 
function of a subset of the states, parameters, and inputs. Here, 
the ei variables become part of Y, and are not included in 
Y*. We denote by E C Y the variable set associated with the 
performance requirement evaluations. 

For the pump model, we have the variable sets X = {w, 
Q , T 0 , T t , T r , b 0 , r t , ry}, 0 = {uy 0 , w t , w r }, U = {p s , p d , 
T a , V, cu s }, and Y = {cu*, Q* , T*, T* , T* , e lt e 2 , e 3 , e 4 }. 
Here, Y* = {to*, Q* , T*, T t * , T *} and E = {e u e 2 , e 3 , e 4 }, 
where the variables e 4 to e 4 correspond to the requirements 
described in Eq. 13 to 16. The * superscript is used on output 
variables that are associated with sensors. 

The notion of a causal assignment is used to specify the 
computational causality for a constraint c, by defining which 
v £ V c is the dependent variable in equation e c . 


Definition 2 (Causal Assignment). A causal assignment a 
to a constraint c = (s c ,V c ) is a tuple a = (c,v° ut ), where 
v° ut £ V c is assigned as the dependent variable in e c . 

We write a causal assignment of a constraint using its 
equation in a causal form, with := to denote explicitly the 
causal (i.e., computational) direction. 

We say that a set of causal assignments A, for a model JA* 
is valid if 

• For all v £ U U 0, A does not contain any a such that 

a = (c, v). 

• For all v £ Y, A does not contain any a = (c,v° ut ) 
where v £ V c — {w™ 1 }. 

• For all v £ V—U—&, A contains exactly one a = ( c , v). 
A causal model is a model extended with a valid set of 

causal assignments. 

Definition 3 (Causal Model). Given a model M* = ( V, C ), 
a causal model for A4* is a tuple jYl = (V,C,A), where A 
is a set of valid causal assignments for A4*. 

For the pump model, the causal constraints are as follows. 
For the states, we have 


“ := J 

f gj dt , 
0 

r 1 . 

(at) 

Q:=j 

' Q dt, 
0 

ft . 

(a 2 ) 

T °'-=l 

' T 0 dt, 
0 
r t 

(«3) 

T ‘ := J 

o 

(a 4 ) 

Tr :=j 

f T r dt, 
0 

ft 

(ct 5 ) 

b ° [ =J 

' -WboQi dt 
0 

ft 

(«e) 

r,:=J 

1 f t dt, 
0 

ft 

(«r) 


' v r dt, 
0 

(as) 


where u ; is given by Eq. 1, Q by Eq. 7, T a by Eq. 8, T) by 
Eq. 9, T r by Eq. 10, Qi by Eq. 6, rt by Eq. 18, and ry by 
Eq. 19. The initial conditions are provided in Table I. For the 


outputs, we have 

w* := to («9) 

Q* := Q, («io) 

T: := To, (an) 

T t * := T t , (a 12 ) 

T; := T r . (a 13 ) 

For the performance requirements, we have 

ei := {t] > r]~), (a 44 ) 

e 2 := ( T a < To), (ctis) 

£3 '■= (T t < T t + ), (<Tl6) 

e 4 := (T r < T+). (a 47 ) 
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Fig. 3. Causal graph for the pump model. 


We visualize a causal model M using a directed graph Q = 
(N,A), where N is the set of nodes corresponding directly 
to the variables F in M. and A is the set of arcs, where 
for every (c,v° ut ) £ A, we include an arc (v 1 ,v° ut ) for each 
v' £ V c — {v° ut }. The causal graph corresponding to the pump 
model is given in Fig. 3. In the graph, we mark inputs with 
dashed circles and states with dashed squares. 

In order to decompose a model into submodels, we need to 
break internal variable dependencies. We do this by selecting 
certain variables as local inputs. Given the set of potential local 
inputs (in general, selected from F) and the set of variables 
to be computed by the submodel (selected from V — U — 0), 
we create from a causal model M a causal submodel Ai,, in 
which a subset of the variables in F are computed using a 
subset of the constraints in C. In this way, each submodel 
computes its variable values independently from all other 
submodels. Further, if the local input values are exactly the 
same as the corresponding variables in the global model, the 
values of local outputs for the submodel will exactly reproduce 
the values of the corresponding variables in the global model. 
A causal submodel can be defined as follows. 

Definition 4 (Causal Submodel). A causal submodel Mi of 
a causal model M = (F C , .4) is a tuple Mi = (F, Ci,A%), 
where F Q V, C t C C, and Ai (T A ^ 0. 

When using outputs (from Y*) as local inputs, the causality 
of these constraints must be reversed, and so, in general, Aj 
is not a subset of A. All remaining causal assignments in A, 
will still be found in A. 

The procedure for generating a submodel from a causal 
model is given as Algorithm 1. Given a causal model M, a set 
of variables that are considered as local inputs U*, and a set 
of variables to be computed F*, the GenerateSubmodel 
algorithm derives a causal submodel Mi that computes F* us- 
ing U*. We provide here a simplified version of the algorithm 
presented in [14], and refer the reader to [14] for the extended 
algorithm and additional details. We briefly summarize the 


Algorithm 1 AT; = GenerateSubmodel(Af , U*, V*) 
1: Vi <- V* 

2: Ci 4- 0 
3: Ai <- 0 
4: variables 4— V * 

5: while variables ^ 0 do 
6: of- pop (variables) 

7: cf- GetBestConstraint(r), F, U* , A) 

8: Ci 4- Ci U {c} 

9: Ait- Ail) {(c,w)} 

10: for all v' £ V c do 

11: if v 1 £ Vi and v’ 0 and v' ^ U* then 

12: variables 4— variables U {i/} 

13: end if 

14: Vi <- Vi U {«'} 

15: end for 

16: end while 
17: Mi 4- ( Vi,Ci,Ai ) 


algorithm below. 

In Algorithm 1, the variables queue represents the set of 
variables that have been added to the submodel but have not 
yet been resolved, i.e., they cannot yet be computed by the 
submodel. This queue is initialized to F*, the set of variables 
that must be computed by the submodel. The algorithm then 
iterates until this queue has been emptied, i.e., the submodel 
can compute all variables in F* using only variables in 
U*. For each variable v that must be resolved, we use the 
GetBestConstraint subroutine (Subroutine 2) to find the 
constraint that should be used to resolve v in the minimal (in 
the number of constraints) way. 

The GetBestConstraint subroutine (simplified 
from [14]) tries to find a constraint that completely resolves 
the variable, i.e., resolves v without further backward 
propagation (all other variables involved in the constraint 
are in F U 0 U U* ). Such a constraint may be the one that 
computes v in the current causality, if all needed variables are 
already in the submodel (in F) or are available local inputs 
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Subroutine 2 c = GetBestConstraint(i>, Vi, U* , A) 
1 : c v <s— find c where (c, v) £ A 
2: if (V Cv - M) C Vi U U* then 
3: return c„ 

4: else 

5: for all y £Y* C\U* do 

6: c y <s— find c where (c, y) £ A 

1: if v € V Cy and (V Cy — {u}) C V; U I/* then 

S: return c v 

9: end if 

10: end for 

1 1 : end if 
12: return c„ 


(in [/*); or such a constraint may be one that computes a 
measured output y* £ U*, in which case the causality will be 
modified such that y* becomes an input, i.e., the constraint in 
the new causality will compute v rather than y* . If no such 
constraint exists, then the constraint that computes v in the 
current causal assignment is chosen, and further backward 
propagation will be necessary. 

For example, consider generating a submodel for the pump 
with U* = { V. , co s , Q*} and V* = {w*}. We first try to resolve 
co* (see Fig. 3). To compute co* in the given causality we 
need to include co in the submodel. To compute co we need 
to include V, co s , Q, and co. Both V and co s are in U* , and 
co is in V, so these variables are resolved. To compute Q, 
we have two options: compute using p^, bo, p s , Q, and co, 
or compute using Q* with the corresponding constraint in the 
causality such that Q is computed. The minimal resolution is 
the second option, so Q* is added to the submodel and the 
causality of the involved constraint is modified. Since Q* is 
in U* , it is resolved. Now all variables in the submodel are 
resolved and the algorithm is complete. 

Clearly, there are many submodels that compute any given 
V* using a given U*. The global model is one such solution. 
Algorithm 1 finds a minimal submodel that satisfies this, which 
is guaranteed in Subroutine 2 by resolving a variable without 
further backward propagation whenever possible. There may 
be multiple submodels that are equally minimal (i.e., due to a 
choice of which local input to use), and the algorithm returns 
the first that it finds. 

The algorithm also generates only complete submodels, i.e., 
the submodels contain at least the variables needed to compute 
its V*. This is guaranteed because the algorithm only stops 
propagation at variables included in V U 0 U U* [14]. 

In the worst case, the algorithm must visit all variables and 
constraints. On each variable. Subroutine 2 is called, which in 
the worst case considers all variables in Y 0(7*. So the overall 
worst-case time complexity is 0((|V| + |i£|) • |Ffl(7*|). Since 
(TTl[/*) C V, the algorithm is polynomial in the model size. 
On average some amount of decomposition will be possible 
so the complexity will be much lower in practice. 

In the next section we describe how this model decomposi- 
tion algorithm is used to decompose a model for the explicit 
purposes of distributed estimation and distributed prediction. 


The distributed model-based prognostics architecture is 
based on structural model decomposition, with local estimation 
and prediction subproblems based on derived local submodels. 
For estimation, we construct minimal submodels, one for 
each output of the model that corresponds to a sensor, i.e., 
for each y* £ Y*. As discussed in Section II-C, we use 
measured sensor signals as local inputs in addition to U. 
For each output y* £ Y* , we create a submodel using 
GenerateSubmodel(Af , U U (Y* — { y *}), {y*}), i.e., we 
use as local inputs the inputs to the global model along with all 
sensor outputs except for y* , and the only local output is y*. 
We define a local estimator based on that local submodel (e.g., 
Kalman filter, unscented Kalman filter, particle filter, etc.). 

Using noisy sensors as local inputs to the estimation sub- 
problems may, of course, result in a loss of accuracy and 
robustness of the local estimators. This is the cost of deriving 
independent local estimators. Without sensor noise, the local 
estimators would produce the same results as a centralized 
estimator, and as noise is added, performance may degrade. 
Note of course that the centralized estimator must deal also 
with sensor noise and so its performance will degrade as 
well. In Section VIII we investigate the effects of increased 
sensor noise on estimation performance for both the distributed 
and centralized cases. In the situation where some sensors 
are unreliable or extremely noisy, they can be removed from 
the set of local inputs. To improve robustness, multi-output 
estimators can also be derived, instead of the proposed single- 
output estimators, but setting V* C Y* (the global model can 
be recovered by setting V* = Y*). 

The causal graphs for the resulting submodels for the pump 
are shown in Fig. 4. We obtain five submodels. The submodel 
for co* has X = {w} and 0 = 0; the submodel for Q* has 
X = { Q , bo } and 0 = {wy , } ; the submodel for T* has X = 
{T a } and 0 = 0; the submodel for T* has X = { T t , rt} and 
0 = {w; t }; and the submodel for T* has X = {T r ,r r } and 
0 = { w r } . Note that the estimation submodels do not contain 
the performance requirements, as these are not required to 
compute the outputs, so are not included in the submodels 
derived by the decomposition algorithm. 

If the measurement set changed, then the resulting local sub- 
models for estimation would change also. For example, con- 
sider a reduced measurement set of Y* = {w*, Q* , T* , T*}, 
i.e., T* is no longer measured. Then only four submodels 
would result, one for each sensor. The submodels for w* and 
Q* remain the same, but since T* can no longer be used 
as a local input, the T t * and T* submodels would have to 
include the T a state and instead use T* and T* as local inputs, 
respectively. The corresponding causal graphs are shown in 
Fig. 5. 

Prediction requires hypothesizing future inputs to the sys- 
tem. Therefore, when selecting local inputs for model de- 
composition, we can select only those variables that can be 
predicted a priori. For example, for the pump, we can use 
co as a local input because it is a controlled variable, and 
we know what the future controlled values are. On the other 
hand, T r cannot be used, since it is evolving due to rt, which 
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(a) Causal graph for u estima- (b) Causal graph for Q estimation submodel, 
tion submodel. 
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(c) Causal graph for T 0 estimation sub- (d) Causal graph for Tt estimation submodel. (e) Causal graph for T r estimation submodel, 
model. 


Fig. 4. Causal graphs for pump estimation submodels. 


is changing in time depending on Wt, and T a , which is in 
turn affected by T t and r t which are changing in time due 
to w t- The fault propagation among the pump temperatures 
prohibits the use of any of the temperature variables being 
used as local inputs. If no variables exist that can be predicted 
a priori outside of U, then the prediction problem cannot be 
decomposed, and the global model must be used for prediction. 

Besides U, local inputs can come from X or Y. In 
some cases, it is advantageous to add additional “virtual” 
outputs if these variables can be predicted a priori but are 
not already in X or Y, to include in U*. We construct 
for each performance requirement a submodel that evaluates 
the requirement. Because EOL is reached when any one of 
the performance requirements are violated, we can evaluate 
them independently to obtain local EOL distributions and 
then take the minimum to get the global EOL distribution. 
For each variable e £ E, we create a submodel using 
GenerateSubmodel(.A/l , Up, {e}), where Up C X U U U 
Y D U is the set of variables that can be predicted a priori. 

For the pump model. Up consists of U and w, as just 
mentioned. The causal graphs for the resulting submodels are 
shown in Fig. 6 . We obtain one submodel associated with the 
efficiency requirement, and three submodels associated with 
the temperature requirements. For the temperature require- 
ments, however, aside from the performance requirement in- 
cluded, each submodel is exactly the same (e.g., the submodel 
for e -2 is that corresponding to the causal graph in Fig. 6 b with 
the e 3 and e 4 variables and constraints removed). This means 


that the temperatures cannot be decomposed (due to the fault 
propagation between them). Therefore, we merge these three 
submodels into one submodel, to avoid unnecessary computa- 
tion, using GenerateSubmodel(A^, C/p, {e 2 , e 3 , 64}). 

As in the centralized scheme, the prediction algorithm 
uses the state-parameter estimates as input. So, the required 
estimates must be constructed from the local estimates of the 
submodels used for estimation. A prediction submodel has a 
set of states Xi and parameters (-),, and must construct a local 
distribution p(xj,, 0\.\y l 0 . k ) from the estimates provided by the 
local estimators. To do this, we assume that the local state- 
parameter estimates may be sufficiently represented by a mean 
H‘ and covariance matrix S'. For each prediction submodel 
i, we combine the estimates from estimation submodels that 
estimate states and parameters in X, U 0, into //' and covari- 
ance S'. If there is overlap in the state-parameter estimates, 
i.e., if two submodels both estimate the same state variable x 
or parameter 9, then we take the average value for common 
means and covariances (alternate strategies may also be used). 
Some covariance information lost due to the decoupling will 
appear as zeros in the recovered covariance matrix. Each 
prediction submodel i computes a local EOL/RUL distribution, 
i.e., p(EOLl p \ yi kp ) and P {RUL\ p |y* :fcp ). The global EOL 
is determined by the minimum of all the local distributions, 
since Teol is 1 whenever any of the local constraints are 
violated. 

The distributed prognostics architecture for the pump is 
shown in Fig. 7. Here we had derived five submodels for the 
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Fig. 7. Distributed prognostics architecture for the pump. 


W t 


CD 

A 


-*i r t \ ►; T t \ ► T, 


CO 


T 0 'f 


T,. < TV 


(a) Causal graph for Tt estimation submodel. 


T t < 


T t 


co 

▼ 

CO 




(a) Causal graph for flow prediction submodel. 


Wf 


* 

CO ► 


CO 


& 


r t 




e3 


e 2 


e 4 


w r 


w r 


(b) Causal graph for T r estimation submodel. (b) Causal graph for temperature requirements submodel. 

Fig. 5. Causal graphs for pump estimation submodels when T* is not Fig. 6. Causal graphs for pump prediction submodels, 
measured. 


purposes of estimation, and two for prediction. We find that 
the submodel estimating w* is actually not needed, because 
the only state it estimates is uj (see Fig. 4a), and this state is 
not required by any of the prediction submodels (see Fig. 6). 
Therefore, we need only four submodels on which to base our 
local estimators, that estimates Q*, M 2 E that estimates 
T*, that estimates T*, and that estimates T*. For 


prediction, we have A4p that predicts the violation of the 
efficiency requirement, and M 2 p that predicts the violation 
of the temperature requirements. 

The global inputs and outputs are first split into the local 
inputs and outputs based on the Ui and Y t of the submodels 
derived for estimation. For example, A4 P uses as inputs p s , 
Pd, and of (the measured value of ui), and computes a single 
output, Q*. The local estimates are computed. Mi 2 p builds its 
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local state using the estimates of T 0 from the A4 2 E estimator, 
Wt, r t , and T t from the AA% estimator, and try, ry, and T r 
from the A4 E estimator. The local predictors compute the local 
EOL/RUL predictions, with the predictor for A4j, computing 
the EOL for the efficiency requirement, and the predictor for 
A dp computing the EOL for the temperature requirements. 
The local predictions are then merged into the global predic- 
tion. The next two sections describe the algorithms used for 
local estimation and local prediction. 


VI. Distributed Estimation 

As described in Section V, in our distributed estimation 
scheme, the local estimator for each submodel A4 l E produces 
a local estimate 0\\yo:k), where xj. C and G k C 6 k . 
Any suitable algorithm may be used for joint state-parameter 
estimation on any of the local subproblems. 

In this paper, we use an unscented Kalman filter (UKF) [25], 
[26] with a variance control algorithm [27] for the estimation 
problems. The UKF assumes the general nonlinear form of 
the state and output equations described in Section II, but 
restricted to additive Gaussian noise. The pump model satisfies 
these constraints. 

We review here the UKF, and refer the reader to [25], [26] 
for details. The UKF approximates a distribution using the 
unscented transform (UT). The UT takes a random variable 
x £ R" x , with mean x and covariance P xx , which is related 
to a second random variable y by some nonlinear function 
y = g(x), and computes the mean y and covariance P yy using 
a (small) set of deterministically selected weighted samples, 
called sigma points [25]. X 1 denotes the /th sigma point from 
x and w 1 denotes its weight. The sigma points are always 
chosen such that the mean and covariance match those of the 
original distribution, x and P xx . Each sigma point is passed 
through g to obtain new sigma points [V, i.e., 

y = g(A") 


with mean and covariance calculated as 


y = J2w i y 

i 

P vy = Y, wi O’ i -y)iy i -y) T - 


We use here the symmetric unscented transform, in which 
2 n x + 1 sigma points are selected symmetrically about the 
mean in the following way: 


w 


X 1 


K 

(■ n x + k) ’ 

2 (n x + k) ’ 


i = 0 

i = 1, ■ . • ,2 n x 


x, 


i = 0 


^+(\A n z+«0 p sx) 
I (^\/ fax T^)P 


; l — 1 ; • • • ) Tt x 
, i — n x T 1 , . . . , 2n x , 


where ^y/(n x + k) P a 
matrix square root of (n a 


refers to the 7th column of the 
+ k)Pxx [26]. The number n is 


a free parameter that can be used to tune the higher order 
moments of the distribution. Note that the sigma point weights 
do not directly represent probabilities, so are not restricted to 
the interval [0,1]. If x is assumed Gaussian, then selecting 
k = 3 — n x is recommended [25]. A smaller value of k will 
bring the sigma points closer together than a larger value. 

In the filter, first, n s sigma points X k _ 1 \ k _ 1 are derived 
from the current mean x k _i\ k _i and covariance estimates 
Pfc_i|fc_i using the sigma point selection algorithm of choice. 
The prediction step is: 

~ i ~ i 

^ k\k— 1 ^\^k— l\k— 1 » u k— l)? ^ 1) • • • 5 n s 

y k \k-i = * = 1 , . . • , 

n s 

Xfeifc-i = 

i 

n s 

fk\k~i = ^w l yi \k~i 

i 

P/c|/c— 1 = Q+ 

n s 

^ v W k\k— 1 — *-k\k— 1){^ k\k — 1 — *k\k-l) 5 

i 

where Q is the process noise covariance matrix. The update 
step is: 

n s 

Pyy = H + 'y' j W l (y 1 k\k_l ~ y k\k-l){y l k\k-l - yk\k-l) T 

i 

n s 

P xy = Y2 W% i X k\k-l - *k\k-l){y l k\k-l - yfc|fe-l) T 

i 

K, — P P 1 

iV /c 1 xy* yy 

Xfc|fc = x- k \k-i + K fc (y fc — y k \k-i) 

Pfc|fc = Pfc|Ic— 1 ^-kPyy^-k ) 

where R is the sensor noise covariance matrix. 

Joint state-parameter estimation can be accomplished in the 
UKF by augmenting the state vector with the unknown pa- 
rameters. The corresponding diagonal elements of the process 
noise matrix, Q, for the parameters G are set to nonzero values. 
In this way, the parameter estimates become time-varying and 
are modified by the filter using the measured outputs. The 
variance values assigned to the parameters determine both the 
rate of parameter estimation convergence and the estimation 
performance once convergence is achieved. Therefore, several 
heuristic approaches have been developed to tune this value 
online to optimize performance, e.g., [2], [27]— [29] . We adopt 
the approach presented in [2], [27], in which the algorithm 
modifies the variance in order to control the variance of the 
parameter estimate to a user-specified range. Note that the 
purpose of the algorithm is to adapt only the (artificial) process 
noise terms associated with the parameters, and process noise 
associated with the states and sensor noise is assumed to be 
known and the associated variance values are not adjusted. 

The algorithm for the adaptation of the variance vector 
associated with 6 , vg, is given as Algorithm 3 (see [27] 
for details), and is called at each time step. We assume that 
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Algorithm 3 vg Adaptation 


Inputs: p(x fc ,0 fc |y O: fc) 

State: ve,k- 1 , 1 «- 1 

Outputs: vg^ 

for all j £ {1,2,..., no} do 

Vj «- RelativeSpread(p(0 fc (j)|y O :fc)) 
if Vj < tj(s(j)) then 
s (j) «- s(i) + 1 

end if 

Vj - v*(s(j)) 


VS.fcO') f- V S ,fc_lO') ( 1 - Pj(s(j)) 

end for 


v*(s(j)) 


ve.fc-l 


Ve fc 


the variance values are tuned initially based on the minimum 
expected EOLs. The adaptation proceeds in stages, maintained 
with the Sj variable for each parameter (with j referring to 
the parameter index). The relative spread is computed as Vj. 
If this value is below the threshold value for the the current 
stage, tj(s(j)), then the stage number is increased. Then the 
new variance v^j : (j) is computed. The error between the 
the actual and the desired spread value for the current stage, 
Vj — v*(s(j)), is normalized by v*(s (j)). This normalized 
error is then multiplied by the proportional gain term for 
the current stage, P ;j (s(j)), and the corresponding variance 
v e,fc-i (j) is increased or decreased by that percentage to 
compute the new variance value 'Vo.k(j)- Tuning of the al- 
gorithm parameters is necessary, but we have found that the 
number of stages Sj = 2 with v* = [50, 10], = [60, 0], 

and P j = [1 x ICG 3 , 1 x 10” 4 ] for all j works well in many 
cases. In the first stage, the variance is kept large to allow for 
convergence, and in the second stage, once convergence has 
begun, the variance is kept small for accurate tracking. 


Algorithm 4 EOL Prediction 


Inputs: 


Outputs: {EOL]} p 
for ;j = 1 to A r do 

k <— kp 
i(j) , i(j ) 

^ x fcp 

o*U) , o*6) 

U k °k P 

Predict u\ 
while Tf OL (x 
Predict uj, 

~p(oi 

GO) 


* 0 ) 
k ’ 


0 L t7 \ufc) = 0 do 


fc+i '~P( x fc+ i' x 
k <— k + 1 


d iU)j 

* 0 ) 

k 


k+l\ u k ) 


3 * 0 ) 


ii) 


,*0| 
oi U) <- 

end while 

EOL^ +- 

end for 


* 0 ) 
'■fc+i 
a*0) 
7 fc+ 1 


many times and the statistics of the global EOL distribution 
are computed. 

Note that prediction for some submodels may complete 
(i.e., simulate all their sigma points to EOL) before others, 
because the damage progression is faster in one submodel 
than in another. To avoid the distributed approach simulating 
beyond the point where a centralized approach would stop, 
we may run the local predictors simultaneously, and terminate 
all predictors whenever the first completes. The unfinished 
samples in the predictors can be ignored, since when taking 
the minimum, they would not be selected anyways. Prediction 
on some submodels may also be avoided altogether if the wear 
rate is clearly dominated by the wear rates on other submodels. 


VII. Distributed Prediction 

Each local prediction module takes as input local state- 
parameter estimates formed from the local estimators, as 
discussed in Section V. Given the mean and covariance 
information, we represent the distribution with a set of sigma 
points derived using the unscented transform. Then, as in [30], 
each sigma point is simulated forward to EOL, and we recover 
the statistics of the EOL distribution given by the sigma points. 

The prediction algorithm is executed for each submodel, 
deriving local EOL predictions using its local threshold func- 
tion. The pseudocode for the prediction procedure is given as 
Algorithm 4 [3]. For a given submodel Ai l P , each sigma point 
j is propagated forward until Teol{*^\ evaluates 

to 1. The algorithm hypothesizes future inputs of the system, 
ii ; r . In this work, we consider only the situation where a 
single future input trajectory is known, because the pump 
in our application undergoes a strict pumping schedule [31]. 
Approaches to handle the case with uncertain future inputs are 
described in [32], [33], 

As discussed in Section V, the global EOL prediction 
is taken as the minimum of the local EOL predictions. To 
compute this, we sample from each local EOL distribution 
and take the minimum of the local samples. This is repeated 


VIII. Results 

We performed a number of simulation-based experiments 
to analyze the performance of the distributed prognostics 
approach compared to a centralized prognostics approach 
for the pump case study. For the distributed approach, we 
implemented the architecture given in Fig. 7. In this section, 
we first provide a demonstration of the approach, followed by 
a summary of a large number of experiments to compare the 
two approaches. We then argue for the improved scalabilty of 
the distributed approach and provide experimental results in 
support of it. 

A. Demonstration of Approach 

Here, we use percent root mean square error (PRMSE) as a 
measure of estimation accuracy, relative accuracy (RA) [34] as 
a measure of prediction accuracy (computed as the difference 
in true and predicted RUL over the true RUL, and expressed 
as a percentage), and RSD as a measure of spread. Each 
prediction metric is averaged over multiple prediction points 
(one every hour of usage) for a single scenario (see [2], [34] 
for the mathematical definitions of the metrics used here). 

As an example scenario, consider the case where iu hu = 1 x 
10“ 3 , Wt = 2 x 10 -11 , and w r = 2.5 x 10 -11 with the nominal 
noise level. Estimation results for the wear parameters for the 
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TABLE II 

Centralized Estimation and Prediction Performance 


n 

PRMSE u , bn 

PRMSE^ 

PRMSE„ r 

RSDiu bn 

RSD mt 

RSD™ r 

RA 

RSD rul 

t 

3.04 

1.90 

3.36 

9.44 

9.55 

9.37 

96.32 

6.95 

10 

3.79 

2.28 

3.97 

9.84 

9.49 

9.56 

96.07 

7.16 

100 

4.15 

2.83 

4.15 

11.11 

9.21 

10.15 

95.26 

7.27 

1000 

3.59 

3.21 

4.50 

11.78 

9.37 

10.78 

94.98 

7.49 


TABLE III 

Distributed Estimation and Prediction Performance 


n 

PRMSEu, bn 

PRMSE^ 

PRMSE™,. 

RSD™ bn 

RSD™ t 

RSD™ r 

RA 


i 

2.98 

1.85 

4.10 

10.39 

10.49 

10.30 

96.21 

8.12 

10 

3.77 

2.35 

5.28 

10.79 

10.42 

10.55 

95.78 

7.99 

100 

4.28 

2.88 

5.69 

11.81 

10.14 

11.02 

95.32 

7.75 

1000 

3.76 

3.55 

5.39 

13.09 

10.23 

12.11 

94.25 

7.99 
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Fig. 8. Centralized estimation results for Wb 0 =1x10 3 ,Wt = 2x10 
and w r = 2.5 X 10 -11 . 


Fig. 9. Distributed estimation results for «)(, = 1 X 10 3 , Wt = 2x 10 
and w r = 2.5 X 10 -11 . 


centralized and distributed approaches are shown in Figs. 8 
and 9, respectively. Note that the minimum and maximum 
values shown are those from the sigma points. Clearly, both 
approaches do very well, and there is no discernible difference 
between the two approaches. Due to the variance control 
algorithm, both approaches converge very quickly to the true 
values of the wear parameters, and remain close to the true 
values with small variance. In both cases, PRMSE for all 
unknown parameters is within 2-3%, with RSD of the wear 
parameters within 9-10% for the centralized case and within 
10-11% for the distributed case. 


For the same scenario, prediction results are given in 
Figs. 10 and 11 for the centralized and distributed approaches, 
respectively, as a- A plots. The a- A metric requires that at a 
given prediction point (A), /3 of the predicted RUL distribution 
must come within a of the true RUL [34]. Here, we use 
a = 0.1 and /3 = 0.5 for all A, i.e., we require that at 
each prediction point, 50% of the distribution lies within 
10% of ground truth. Both approaches pass the test at all 
prediction points, so either approach will obtain the desired 
performance. Figs. 10 and 11 show the result of the test and 
the percentage of the distribution lying within the o-bounds. 
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Fig. 10. Centralized prognosis results for u!(, 0 =1 x10 3 ,wt = 2x10 1 1 . 
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Distributed prognosis results for Wb 0 = 1 x 10 3 , wt = 2 x 10 
= 2.5 X 1CT 11 with a = 0.1 and 0 = 0.5. 


The corresponding RA is 96.90% for the centralized case and 
96.54% for the distributed case, with the RSD of the RUL at 
7.44% for the centralized case and 6.85% for the distributed 
case. 

If T* is not measured (see Section V), both the estimation 
and prediction performance are virtually the same. PRMSE is 
within 2-3%, being only slightly higher (less than 1%) for 
the distributed case. RSD of the wear parameters is again 
within 9-10% for the centralized case and within 10-11% for 
the distributed case. Prediction results are also similar, with 
RA being 96.79% for the centralized case and 96.82% for 
the distributed case, with RSD of the RUL being 7.15% for 
the centralized case and 6.95% for the distributed case. Since 
dropping the T* measurement does not significantly affect 
observability of the system, both centralized and distributed 


prognostics still perform well. 


11. Prognostics Performance 

In a single experiment, combinations of wear parameter 
values were selected randomly within the following ranges: 
[1 x 10" 3 , 3 x 10" 3 ] for w bo , in [1 x KT 11 , 3 x 1CT 11 ] for w t , 
and in [2 x 10 — 11 , 5 x 10 _n ] for w r , such that the maximum 
wear rates corresponded to a minimum EOL of 20 hours. For 
the variance control algorithm, with relative standard deviation 
(RSD) as the measure of spread, we used Sj = 2 with 
v* = [50,10], tj = [60,0], and = [1 x 10“ 3 , 1 x 10~ 4 ] 
for all j, in all experiments. Since the local estimators use 
measured values as inputs, performance will degrade as sensor 
noise is increased, so we varied the sensor noise variance by 
factors of 1, 10, 100, and 1000, to explore this situation. We 
performed 30 experiments for each sensor noise level for both 
the centralized and distributed approaches. We considered the 
case where the future input of the pump is known, in order to 
limit the uncertainty to only that involved in the noise terms 
and that introduced by the filtering algorithms. The pump 
operates at two different RPM values, changing every half 
hour, as shown in Fig. 2. For the pump model, we used a 
first-order discrete-time approximation using a step size of 1 s. 

The averaged estimation and prediction performance results 
are shown in Table II for the centralized approach, and 
Table III for the distributed approach. The column labeled n 
lists the sensor noise variance multipliers. Note that all metrics 
are expressed as percentages. 

We expect that in going from a centralized implementation 
to a distributed implementation there will be some loss of 
performance, due to the information lost in the decomposition, 
but that this performance loss will not be significant. As 
shown in Tables II and III, both the centralized and distributed 
approaches obtain high accuracy and precision, with RA over 
94% and RSD /,’(//, under 9%, and the performance of the 
distributed approach is virtually the same as the centralized 
approach. The distributed approach yields only small decreases 
in prediction accuracy (less than 1%) and small increases in 
spread (less than 1.2%). The decrease in performance of the 
distributed approach is expected, since the local estimators use 
noisy measurement values as inputs. Consequently, estimation 
performance decreases slightly and this translates to decreases 
in prediction performance. The distributed approach also must 
hypothesize the future value of w, which is not completely 
accurate, and therefore also contributing to the slight decrease 
in performance. 

As expected, both approaches perform worse as sensor 
noise increases, but with only small decreases in performance. 
The centralized approach loses 1.34% for RA, whereas the 
distributed approach loses 1.96% for RA. The decrease in 
performance from one noise level to the next higher level is 
larger with the distributed approach since the local estimation 
approach is more sensitive to noise. 

As shown in this specific example and comparing Tables II 
and III, for the pump model, the distributed approach achieves 
prognostics performance virtually identical to the centralized 
approach. Although these results are only empirical, they 
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should extend to other systems as well. For distributed esti- 
mation, the covariance information lost due to the decoupling 
is, in this case (and likely many others), negligible, and is not 
needed for prediction, so only a small loss in performance 
is expected, if any. For distributed prediction, the quality 
of the predictions will depend on the estimation results, so 
errors in estimation will propagate into prediction. Within the 
distributed prediction itself though, there is no information loss 
due to the decomposition. 

C. Computational Efficiency 
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The distributed approach will always yield more efficient 
local estimators and predictors compared to the centralized ap- 
proach, as long as some amount of decomposition is achieved, 
because each local submodel will be smaller than the global 
model, and the complexity of the estimation and prediction 
algorithms is a function of the model size. So, if each 
local estimator (predictor) is implemented on an independent 
processor, the distributed approach will be faster compared to 
the centralized approach on a single processor. 

In particular, for the UKF, the computational complexity 
is polynomial in the state-parameter dimension (0(n 3 ) for 
dimension n [35]). Using the symmetric unscented transform, 
there are 2n + 1 sigma points for a state-parameter vector of 
size n. For the centralized approach, the state-parameter vector 
is of size 11, yielding 23 sigma points. For the distributed 
approach, the state-parameter vectors are of size 3, 3, 3, and 
1, yielding 7, 7, 7, and 3 sigma points, respectively. In the 
implementation for the pump experiments, on average, we find 
that a single local estimator operates around 14% faster. 2 

For the prediction algorithm, using the sigma points as 
the sample set, each local predictor has less samples than 
the global predictor, so, all things being equal, will be faster 
than the centralized predictor. Overall computational efficiency 
depends also on the spread of the samples, i.e., a sample with 
a wear rate value closer to zero will take longer to simulate to 
EOL than one with a larger wear rate value, given the same 
inputs [30]. If the distributed approach can achieve the same 
spread, then, it should be faster than the centralized approach 
since less samples are simulated forward. For the pump, the 
centralized approach uses the 24 sigma points obtained by the 
global estimator. The distributed approach has two submodels 
with state-parameter vectors of size 3 and 7, yielding 7 and 
15 sigma points. The centralized and distributed approaches 
do achieve approximately the same amount of spread (e.g., 
compare Figs. 8 and 9), and we find that the distributed 
approach is, on average, about 15% faster. 


D. Scalability 

As the size of the system increases, we expect that the 
computational cost of the distributed approach grows more 

2 We find the performance gain of the distributed approach smaller than 
expected due to both the overhead associated with the UKF implementation, 

and the fact that the implementation platform, MATLAB®, is very efficient at 
matrix multiplications; for relatively small matrices the performance difference 
is quite small. 
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Fig. 12. Scalability of centralized and distributed approaches. 
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slowly than that of the centralized approach, i.e., the dis- 
tributed approach is more scalable. For the case of UKF 
estimation, assume we have the least possible decomposition, 
where for global model dimension n the largest local submodel 
has dimension n — 1. Then the complexities are 0(n 3 ) and 
0((n — l) 3 ), and if the size of the system increases by 
one state, the complexities are 0((n + l) 3 ) and 0((n) 3 ), 
and the complexity of the centralized approach grows by 
a larger margin than the distributed approach, therefore the 
distributed approach is more scalable. The argument is similar 
for distributed prediction. 

The improved scalability is confirmed experimentally also. 
As a large-scale system we adopted a system consisting of 
n pumps. In this case, the decomposition results remain the 
same, where each pump results in 4 submodels for estimation 
and 2 submodels for prediction. For example, for a 5-pump 
system, there are 20 submodels for estimation and 10 for 
prediction. The scalability results for estimation are shown 
in Fig. 12a. As the size of the system increases, the amount 
of computation time required per step increases exponentially 
for the centralized approach. The distributed approach, on the 
other hand, stays constant, because even though there are 
more submodels, they are all operating in parallel. Even if 
the distributed approach was implemented sequentially on a 
single processor, the total amount of computation would grow 
only linearly, since each new pump adds 4 new submodels of 
fixed size. The scalability results for prediction are shown in 
Fig. 12b, and the results are similar to the case for estimation. 
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IX. Related Work 

Model-based prognostics approaches have been developed 
previously and applied to other components and fault modes, 
such as batteries [4], [36], fatigue cracks [37], [38], and 
automotive suspension systems [5]. Most model-based ap- 
proaches are based on using filters for state estimation. Kalman 
filters have been used for prognostics of electrolytic capacitors 
in [39]. A model -based prognostics methodology is developed 
in [5] using an interacting multiple model filter for state- 
parameter estimation and prediction. An application of the 
approach of [5] to a centrifugal pump is developed in [23], 
but considers only a single degradation mode. Particle filters 
have been the most popular and have been used in [4], [37], 
[38], [40], [41], among others. 

Some distributed prognostics approaches have also been 
explored. A distributed prognostics approach based on particle 
filters is developed in [15], and one based on Gaussian 
process regression in [16]. In contrast to our approach, these 
approaches still solve the global problem, and distribute only 
the computation. We propose a fundamentally different and 
novel distributed architecture, in which the global problem 
is decomposed into subproblems that can be solved inde- 
pendently and computation trivially distributed. This type of 
architecture is favorable, because there may be parts of the 
global problem that are not relevant to prognostics, and do not 
need to be solved (e.g., estimation of lj in the pump model). In 
a global approach where only the computation is distributed, 
these parts of the problem are still being solved. The local 
subproblems themselves can then be solved in a distributed 
fashion using an approach such as that described in [15]. 

The idea of using model decomposition to distribute state 
and parameter estimation is not new. Subspace methods [42], 
[43] have been used for solving identification problems in 
large dimensional systems by employing QR-factorization 
and singular-value decomposition [44]. These methods have 
been successfully used for linear systems, but face robust- 
ness problems when applied to nonlinear systems. Moreover, 
methods to automatically derive the decomposition directly 
from the system model have not been proposed. Regarding 
structural model decomposition, in [9], Williams and Millar 
propose an approach for decomposing a system model into 
smaller hierarchically organized subsystems, called dissents, 
applied to learning problems. Similar techniques, like Analyti- 
cal Redundancy Relations (ARRs) [45] and Possible Conflicts 
(PCs) [10], both used for diagnosis, are also based on the 
idea of model decomposition. Dissents, ARRs and PCs are 
all conceptually equivalent [10]. PCs have been previously 
applied to generate a more robust and computationally simpler 
parameter estimation approach for fault identification [18]. 
Simulation results in that case showed an improvement in 
estimation accuracy while having a faster convergence to true 
solutions. Similar work was proposed in [46] using a dynamic 
Bayesian network (DBN) modeling framework, in which an 
automatic approach for model decomposition into submodels 
based on structural observability was developed for efficient 
state estimation and fault identification. We instead use a 
more general model decomposition framework, distributing 


the estimation problem in a way similar to these previous 
approaches, but distributing also the prediction problem in a 
novel way. 

X. Conclusions 

In this paper, we developed a novel distributed model-based 
prognostics approach based on structural model decomposi- 
tion. The global model of a system is decomposed into a set of 
local submodels, from which independent local estimation and 
prediction problems are posed to solve the global prognostics 
problem in a distributed fashion that scales well. We applied a 
general model decomposition framework to generate minimal 
submodels for estimation and prediction. The local estimators 
compute local state-parameter estimates that define the system 
health state, and this information is used as an input to the local 
predictors, which compute EOL and RUL predictions for their 
submodels. The system EOL prediction can then be formed 
as the minimum of the local EOL predictions. 

A centrifugal pump model was used for a simulation- 
based case study, demonstrating that, for all practical purposes, 
the distributed scheme has identical prognostics performance 
to the centralized scheme. Minor decreases in performance 
are observed, as expected, since the distributed scheme de- 
composes the models by using noisy sensor values as local 
submodel inputs, however, the impact was not significant and 
did not increase significantly as sensor noise increased. The 
distributed approach also offers improvements in computa- 
tional efficiency and scalability in both the estimation and 
prediction steps. 

In future work, we will apply this framework to system-level 
prognosis of large-scale systems. Further, the amount of model 
decomposition that can be achieved, for the estimation prob- 
lem, is dependent on the number of sensors and where they 
are placed, so algorithms are needed for optimal placement of 
sensors to achieve the best model decompositions, and, hence, 
the best decomposition of the prognostics problem. 
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